Target Audience
Investors planning a new restaurant venture in the New York City area who have a strong concept and food selection to offer, but who hesitate to go ahead with the investment because they lack an understanding of the neighborhood targeted for the start-up.
Problem to Solve
Identify the neighborhood in which to open the restaurant that gives the business the best chance to succeed. The concern stems from research findings by the US Small Business Administration (SBA) that 30% of new businesses fail in the first 2 years, 50% within 5 years and 66% within 10 years. These findings make it a necessity for new business investors to include location data analysis in their research before setting up a business. Investors need good insight into the neighborhood in which the business will operate. For a new venture such as a restaurant to be viable, an investor has to ensure the neighborhood is in a high foot traffic area with good proximity to the city center. The investor also needs to look critically at population and crime trends. To address these concerns, an exploratory data analysis has been conducted on all of the New York City boroughs using NYC Open Data containing historical crime, population and housing data. After gaining better insight into the boroughs, segmentation and clustering are conducted on the underlying neighborhoods, using the neighborhoods' social networking data made available by the Foursquare API, to identify the ideal location for the business.
Why Target Audience Would Care
Restaurant investors will find the results of the analysis valuable, because good insight into the neighborhood gives them the information to anticipate and better manage some of the risks that come with a new business start-up.
Data Source
New York City Open Data
- NYPD Complaint Historical Data 2006 to 2017
- NYC Borough Boundaries (JSON)
New York City Planning
- NYC Borough Population 1900 to 2010
- NYC Total Housing Units 1940 to 2010
New York University Spatial Data Repository
- NYC Borough Neighborhood Coordinates (JSON)
Google Map API
- Reverse Geocoding
Foursquare API
- Neighborhood Venues
Data Wrangling and Transformation Approach
New York City Open Data
- NYPD Complaint Historical Data 2006 to 2017
Due to space limitations in the Cognitive Class lab environment, this 1.8 GB dataset with 6.04 million rows and 35 columns had to be downloaded directly to a personal computer and loaded into a Microsoft Access database, so that the required data could be extracted and transformed into the required format using SQL.
Source: https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Historic/qgea-i56i
- NYC Borough Boundaries (JSON)
The encoding format had to be enhanced slightly so that crime data could be incorporated when generating the Python Folium choropleth map.
Source: https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
New York City Planning
- NYC Borough Total Population 1900 to 2010
Minimal transformation required.
Source: https://www1.nyc.gov/site/planning/data-maps/nyc-population/historical-population.page
- NYC Borough Total Housing Units 1940 to 2010
Minimal transformation required.
Source: https://www1.nyc.gov/site/planning/data-maps/nyc-population/historical-population.page
New York University Spatial Data Repository
- NYC Borough Neighborhood Coordinates (JSON)
The JSON file had to be downloaded separately, and several missing neighborhoods had to be added to produce an unbiased result.
Source: https://geo.nyu.edu/catalog/nyu_2451_34572
Google Map API
A Google Cloud account was opened in order to perform reverse geocoding. The NYPD crime data provides only the latitude and longitude of each reported crime, whereas the analysis requires neighborhood-level crime data to be incorporated into the investor's decision making.
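As a rough illustration of the reverse geocoding step, the sketch below pulls a neighborhood name out of a geocoding reply. It assumes the reply follows the JSON layout documented for the Google Geocoding API (`results`, then `address_components`, each with a `types` list); the function name and the sample reply are hypothetical and hand-written, not taken from the original analysis.

```python
from typing import Optional


def neighborhood_from_geocode(response: dict) -> Optional[str]:
    """Return the first address component tagged 'neighborhood', if any."""
    for result in response.get("results", []):
        for component in result.get("address_components", []):
            if "neighborhood" in component.get("types", []):
                return component.get("long_name")
    return None  # no neighborhood-level component in the reply


# Hand-written sample shaped like a reverse geocoding reply (not real API output).
sample_response = {
    "status": "OK",
    "results": [
        {
            "address_components": [
                {"long_name": "Astoria", "short_name": "Astoria",
                 "types": ["neighborhood", "political"]},
                {"long_name": "Queens", "short_name": "Queens",
                 "types": ["political", "sublocality", "sublocality_level_1"]},
            ]
        }
    ],
}

print(neighborhood_from_geocode(sample_response))
```

Returning `None` when no neighborhood component is present lets the caller decide whether to drop the record or fall back to another lookup.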
Foursquare API
A Foursquare developer account was opened in order to use its API, which provides social networking location information about venues, users, and check-ins. This data makes it possible to perform k-means clustering of New York City's underlying neighborhoods.
The target location for the new restaurant venture is the New York City area. An exploratory analysis of the area has been conducted to identify the neighborhoods that give the business the best chance of being viable. Crime, housing and population trends were explored. Once a specific borough has been identified, one-hot encoding is performed on the venues located in its underlying neighborhoods so that the k-means clustering algorithm can be used to identify similarities among neighborhood venues.
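The one-hot encoding and k-means steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code used in the analysis: the venue list is invented, the function names are hypothetical, and the centroids are seeded with the first k points purely to keep the example deterministic. Libraries such as pandas (`get_dummies`) and scikit-learn (`KMeans`) offer equivalents.

```python
from collections import defaultdict


def one_hot_profiles(venues):
    """Map (neighborhood, category) pairs to mean one-hot vectors per neighborhood."""
    categories = sorted({cat for _, cat in venues})
    index = {cat: i for i, cat in enumerate(categories)}
    rows = defaultdict(list)
    for hood, cat in venues:
        vec = [0.0] * len(categories)
        vec[index[cat]] = 1.0  # one-hot encode this venue's category
        rows[hood].append(vec)
    # Averaging the one-hot rows gives each category's share of venues.
    profiles = {hood: [sum(col) / len(vecs) for col in zip(*vecs)]
                for hood, vecs in rows.items()}
    return categories, profiles


def kmeans(points, k, iterations=20):
    """Basic Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented venue data for illustration only.
venues = [
    ("Astoria", "Greek Restaurant"), ("Astoria", "Bar"), ("Astoria", "Greek Restaurant"),
    ("Flushing", "Chinese Restaurant"), ("Flushing", "Bubble Tea Shop"),
    ("Long Island City", "Bar"), ("Long Island City", "Greek Restaurant"),
    ("Murray Hill", "Chinese Restaurant"), ("Murray Hill", "Chinese Restaurant"),
]
categories, profiles = one_hot_profiles(venues)
names = list(profiles)
cluster_of = dict(zip(names, kmeans([profiles[n] for n in names], k=2)))
print(cluster_of)
```

Clustering on category shares rather than raw counts keeps neighborhoods with many venues from dominating the distance calculation.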
As noted in the data section, because of space limitations and source websites employing URL redirection, most of the required data was downloaded to a personal computer, where data wrangling and transformation were carried out. The enhanced and transformed datasets were then uploaded to the Cognitive Class Labs environment for the analysis.
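To give a sense of the kind of SQL extraction performed offline, here is a small sketch that counts complaints per borough and year. It swaps in Python's built-in `sqlite3` for Microsoft Access so the example is self-contained. The column names `CMPLNT_FR_DT` and `BORO_NM`, the `MM/DD/YYYY` date format, and the sample rows are assumptions made for illustration; the actual queries used in the analysis are not shown in this report.

```python
import sqlite3

# In-memory database standing in for the Access database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE complaints ("
    "CMPLNT_NUM TEXT, CMPLNT_FR_DT TEXT, BORO_NM TEXT, LAW_CAT_CD TEXT)"
)

# Invented rows for illustration; dates are assumed to be MM/DD/YYYY strings.
sample_rows = [
    ("1", "01/15/2017", "QUEENS", "FELONY"),
    ("2", "03/02/2017", "QUEENS", "MISDEMEANOR"),
    ("3", "07/21/2017", "BRONX", "FELONY"),
    ("4", "11/30/2016", "QUEENS", "VIOLATION"),
]
conn.executemany("INSERT INTO complaints VALUES (?, ?, ?, ?)", sample_rows)

# Count complaints per borough and year; the year is the last 4 characters
# of the date string (positions 7 to 10).
query = """
    SELECT BORO_NM,
           substr(CMPLNT_FR_DT, 7, 4) AS complaint_year,
           COUNT(*) AS n_complaints
    FROM complaints
    GROUP BY BORO_NM, substr(CMPLNT_FR_DT, 7, 4)
    ORDER BY BORO_NM, complaint_year
"""
borough_year_counts = conn.execute(query).fetchall()
for row in borough_year_counts:
    print(row)
```

Aggregating before upload reduces a multi-million-row table to a few dozen rows, which is small enough for a space-constrained lab environment.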
New York City Boroughs 2017 Reported Crime Map
New York City Historical Population Covering Period 1900 to 2010
New York City Historical Housing Units Covering Period 1940 to 2010
New York City Historical Reported Crime Covering Period 2006 to 2017
New York City 2017 Reported Crime Type Comparison
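The borough crime map listed above relies on boundary data that carries crime figures. One way this enrichment could be done is sketched below: each GeoJSON feature receives a `crime_count` property looked up by borough name. The property key `boro_name`, the function name, and the numbers are assumptions for illustration (the counts are placeholders, not real statistics), and the report does not state how the original file was modified. Folium's `Choropleth` can alternatively join a separate table to the boundaries through its `key_on` argument.

```python
import json


def add_crime_counts(geojson: dict, counts: dict, name_key: str = "boro_name") -> dict:
    """Return a copy of the GeoJSON with a crime_count property on each feature."""
    enriched = json.loads(json.dumps(geojson))  # simple deep copy of JSON data
    for feature in enriched["features"]:
        name = feature["properties"][name_key]
        # Boroughs missing from the counts table default to 0.
        feature["properties"]["crime_count"] = counts.get(name, 0)
    return enriched


# Tiny stand-in for the borough boundaries file (geometry omitted for brevity).
boundaries = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"boro_name": "Queens"}, "geometry": None},
        {"type": "Feature", "properties": {"boro_name": "Bronx"}, "geometry": None},
    ],
}
placeholder_counts = {"Queens": 120, "Bronx": 95}  # made-up numbers

enriched = add_crime_counts(boundaries, placeholder_counts)
print(json.dumps(enriched["features"][0]["properties"]))
```

Working on a copy leaves the downloaded boundary data untouched, so the same file can be reused for other years or crime types.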
Data for all New York City boroughs has been taken into account in reaching a well-considered decision. Given the data available, this process selects the borough of Queens as the preferred area for the new business venture.
Further exploratory analysis is conducted on 2017 reported crime data for the underlying neighborhoods of Queens. Note that in this section only a random sample of 10% of the overall 2017 data is used, because of the time reverse geocoding takes to identify the neighborhood where each crime took place and the cost of making extensive calls to the Google Map API.
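Drawing a reproducible 10% sample before making any geocoding calls can be sketched as below. The function name, the fixed seed, and the synthetic coordinates are assumptions for illustration; the report does not say how the original sample was drawn.

```python
import random


def sample_fraction(records, fraction=0.10, seed=42):
    """Return a reproducible random sample containing `fraction` of the records."""
    rng = random.Random(seed)  # fixed seed so the same rows are chosen on every run
    k = round(len(records) * fraction)
    return rng.sample(records, k)


# Synthetic (latitude, longitude) pairs standing in for complaint coordinates.
coordinates = [(40.70 + i * 0.001, -73.80 - i * 0.001) for i in range(50)]

sampled = sample_fraction(coordinates)
print(len(sampled), "of", len(coordinates), "records selected for reverse geocoding")
```

Fixing the seed means the same complaints are geocoded on each run, so previously fetched results can be cached and reused instead of paying for repeat API calls.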
The results of the exploratory analysis and of the k-means clustering applied to Queens neighborhood venues using the Foursquare API are also presented in this section. As noted, the crime-related results are based on a sample, which likely affects them. Additional views of the resulting data are presented in the subsequent section.
Queens Borough 2017 NYPD Reported Crime Map - Utilizing Crime Geographic Coordinates